Recursive nearest agglomeration (ReNA): fast clustering for approximation of structured signals

Authors

  • Andres Hoyos Idrobo
  • Gaël Varoquaux
  • Jonas Kahn
  • Bertrand Thirion
Abstract

In this work, we revisit fast dimension reduction approaches, such as random projections and random sampling. Our goal is to summarize the data to decrease the computational cost and memory footprint of subsequent analysis. Such dimension reduction can be very efficient when the signals of interest have a strong structure, as images do. We focus on this setting and investigate feature clustering schemes for data reduction that capture this structure. An impediment to fast dimension reduction is that good clustering comes with large algorithmic costs. We address it by contributing a linear-time agglomerative clustering scheme, Recursive Nearest Agglomeration (ReNA). Unlike existing fast agglomerative schemes, it avoids the creation of giant clusters. We empirically validate that it approximates the data as well as traditional variance-minimizing clustering schemes that have quadratic complexity. In addition, we analyze signal approximation with feature clustering and show that it can remove noise, improving subsequent analysis steps. As a consequence, data reduction by clustering features with ReNA yields very fast and accurate models, making it possible to process large datasets on a budget. Our theoretical analysis is backed by extensive experiments on publicly available data that illustrate the computational efficiency and the denoising properties of the resulting dimension reduction scheme.
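
The reduction step described in the abstract, summarizing each cluster of features by a single value, can be sketched as follows. This is a minimal illustration under stated assumptions, not ReNA itself: the cluster labels are supplied by hand rather than computed by any agglomeration scheme, the cluster mean is assumed as the summary, and the names `reduce_features` and `expand_features` as well as the toy data are hypothetical.

```python
from collections import defaultdict


def reduce_features(sample, labels):
    """Average the features of one sample within each cluster.

    Assumption: each cluster is summarized by its mean value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(sample, labels):
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def expand_features(reduced, labels):
    """Map cluster means back to the original feature space."""
    return [reduced[label] for label in labels]


# Hypothetical 1-D "image" with 6 pixels: two flat regions plus small noise.
sample = [1.1, 0.9, 1.0, 4.2, 3.8, 4.0]
# Hypothetical, hand-written labels grouping neighbouring pixels.
labels = [0, 0, 0, 1, 1, 1]

reduced = reduce_features(sample, labels)   # 6 features -> 2 cluster means
approx = expand_features(reduced, labels)   # piecewise-constant approximation
```

Here six features are stored as two numbers, and the piecewise-constant approximation averages out the small fluctuations inside each region, which is the kind of denoising effect the abstract refers to.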

Similar resources

Ensembles of models in fMRI : stable learning in large-scale settings. (Ensembles des modeles en fMRI : l'apprentissage stable à grande échelle)

In medical imaging, collaborative worldwide initiatives have begun the acquisition of hundreds of terabytes of data that are made available to the scientific community, in particular functional Magnetic Resonance Imaging (fMRI) data. However, this signal requires extensive fitting and noise reduction steps to extract useful information. The complexity of these analysis pipelines yields results th...

A Survey on Efficient Clustering Methods with Effective Pruning Techniques for Probabilistic Graphs

This paper surveys K-NN queries, DCR queries, agglomerative complete-linkage clustering, an extension of the edit-distance-based graph definition algorithm, and the solving of decision problems under uncertainty. The existing system serves as a starting point: graph agglomeration aims to divide information into clusters according to their similarities, and a variety of algorithms are planned for agglomeration gr...

Journal:
  • CoRR

Volume: abs/1609.04608

Publication date: 2016